Method and system for detecting sea-surface oil

ABSTRACT

A behavioral recognition system may include both a computer vision engine and a machine learning engine configured to observe and learn patterns of behavior in video data. Certain embodiments may be configured to detect and evaluate the presence of sea-surface oil on the water surrounding an offshore oil platform. The computer vision engine may be configured to segment image data into detected patches or blobs of surface oil (foreground) present in the field of view of an infrared camera (or cameras). A machine learning engine may evaluate the detected patches of surface oil to learn to distinguish between sea-surface oil incident to the operation of an offshore platform and the appearance of surface oil that should be investigated by platform personnel.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 14/823,771 filed on Aug. 11, 2015, which claims priority to U.S. patent application Ser. No. 13/971,027 filed on Aug. 20, 2013, which itself claims priority to provisional patent application Ser. No. 61/691,102, filed on Aug. 20, 2012, and which are hereby incorporated by reference in their entirety.

BACKGROUND OF THE INVENTION

Field of the Invention

Embodiments of the invention provide techniques for analyzing a sequence of video frames. More particularly, embodiments of the invention provide a combination of a camera system and a computer vision engine and machine learning system configured to detect and evaluate the presence of sea-surface oil, e.g., surrounding an offshore drilling platform.

Description of the Related Art

Some currently available video surveillance systems provide simple object recognition capabilities. For example, a video surveillance system may be configured to classify a group of pixels (referred to as a “blob”) in a given frame as being a particular object (e.g., a person or vehicle). Once identified, a “blob” may be tracked from frame-to-frame in order to follow the “blob” moving through the scene over time, e.g., a person walking across the field of vision of a video surveillance camera. Further, such systems may be configured to determine when an object has engaged in certain predefined behaviors. For example, the system may include definitions used to recognize the occurrence of a number of pre-defined events, e.g., the system may evaluate the appearance of an object classified as depicting a car (a vehicle-appear event) coming to a stop over a number of frames (a vehicle-stop event). Thereafter, a new foreground object may appear and be classified as a person (a person-appear event) and the person then walks out of frame (a person-disappear event). Further, the system may be able to recognize the combination of the first two events as a “parking-event.”

However, such surveillance systems typically are unable to identify or update objects, events, behaviors, or patterns (or classify such objects, events, behaviors, etc., as being normal or anomalous) by observing what happens in the scene over time; instead, such systems rely on static patterns defined in advance. Thus, in practice, these systems rely on predefined definitions for objects and/or behaviors to evaluate a video sequence. Unless the underlying system includes a description for a particular object or behavior, the system is generally incapable of recognizing that behavior (or at least instances of the pattern describing the particular object or behavior).

No currently available video surveillance system is capable of reliably identifying sea-surface oil, which can result from operations incident to the normal operation of an offshore oil platform or oil spills, leaks, etc. Although the optical properties of oil-films in the visible, UV, and IR spectral regions have been studied extensively, a system designed to identify sea-surface oil must address constant variations in the maritime environment, including changes in illumination angle, transparency, aerosols, haze, cloud cover, and transitions between night and day. Such variations can produce false-positive and otherwise erroneous identifications of sea-surface oil.

SUMMARY OF THE INVENTION

One embodiment of the invention includes a method for analyzing a scene depicted in an input stream of video frames captured by one or more cameras. This method may include, for one or more of the video frames, identifying one or more foreground blobs in the video frame. Each foreground blob may correspond to one or more contiguous pixels of the video frame determined to depict sea-surface oil. This method may further include evaluating the one or more foreground blobs to derive expected patterns of observations of sea-surface oil within a field-of-view of the cameras. The input stream of video frames may be generated by one or more long wavelength infrared (LWIR) cameras.

In a particular embodiment, this method may further include, after deriving the expected patterns of occurrences of sea-surface oil, receiving a set of foreground blobs identified in a subsequent one of the video frames and, upon determining that at least a first one of the foreground blobs does not correspond to at least one of the expected patterns of occurrences of sea-surface oil, generating an alert message.

Another embodiment includes a method of analyzing a scene depicted in an input stream of video frames. This method includes, for one or more of the video frames, identifying one or more foreground blobs in the video frame. Each foreground blob generally corresponds to contiguous pixels of the video frame determined by a behavior recognition system to depict a patch of sea-surface oil. Further, the behavior recognition system is configured to learn to distinguish between foreground objects depicting patches of sea-surface oil and false positive detections of patches of sea-surface oil resulting from noise occurring in the one or more video frames. Upon determining that one of the foreground blobs depicting a patch of sea-surface oil deviates from expected patterns of sea-surface oil derived by the behavior recognition system, an alert message is generated. Examples of noise that result in false-positive foreground blobs include lighting, absorption, and extinction artifacts in the video frames.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which the above recited features, advantages, and objects of the present invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments illustrated in the appended drawings.

It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.

FIG. 1 illustrates components of a video analysis system, according to one embodiment of the invention.

FIG. 2 further illustrates components of the video analysis system shown in FIG. 1, according to one embodiment of the invention.

FIG. 3 illustrates a system for generating a synthetic video stream for detecting sea-surface oil and deriving expected patterns in the synthetic video stream, according to one embodiment of the invention.

FIG. 4 illustrates spectral radiance contrast between seawater and modeled surface oil.

FIG. 5 illustrates an exemplary geometry for mounting video cameras on an offshore oil platform, according to one embodiment of the invention.

FIG. 6 illustrates a method for detecting and reporting on anomalous sea-surface oil, according to one embodiment of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the present invention provide a method and a system for analyzing and learning to identify unusual dispersions of oil floating on a liquid surface. A computer vision engine may be configured to process video frames from multiple cameras observing a common region of sea surface. The computer vision engine may evaluate frames of video to determine what pixels depict seawater (background) and what pixels depict oil floating on the sea surface (foreground). Contiguous regions of pixels classified as foreground are passed to a machine learning engine, which observes a variety of features of the foreground blobs to learn expected patterns in the scene and issue an alert when unexpected, anomalous oil patches are observed.

In one embodiment, a multiplexor module is configured to receive video streams (also referred to herein as “signals”) from three or more long-wavelength infrared (LWIR) cameras whose output is filtered by distinct band-pass filters and multiplex the signals to generate a single synthetic signal whose brightness indicates a match with an IR signature of sea-surface oil. A computer vision engine determines, from the synthetic signal, foreground blobs representing patches of contiguous pixels having values indicating a match to the IR signature of oil, and further extracts features such as position, size, change in size, etc. which are pertinent to sea-surface oil. In turn, a machine learning engine is configured to build models of certain behaviors within the scene based on the foreground blobs and extracted features, and determine whether observations indicate that the behavior of an object is anomalous or not, relative to the model. In one embodiment, e.g., the machine learning engine may model observed sea-surface oil over time, and determine whether any given foreground blob corresponding to sea-surface oil is unusual or anomalous relative to prior sea-surface oil which has been observed. The machine learning engine may issue an alert when anomalous sea-surface oil is observed so that the oil may be investigated.
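The fusion of band-filtered signals into a single synthetic signal can be sketched as follows. This is only an illustration under stated assumptions: the scoring rule (cosine similarity between a pixel's per-band values and a reference signature), the name `OIL_SIGNATURE`, and its values are hypothetical and are not taken from this description, which does not specify how the signals are combined.

```python
import math

# Hypothetical relative responses of oil in three LWIR bands (illustrative values only).
OIL_SIGNATURE = (0.8, 0.5, 0.3)

def synthesize(bands, signature=OIL_SIGNATURE):
    """Fuse registered band images (lists of rows) into one 0-255 image.

    Each output pixel is the cosine similarity between that pixel's
    per-band values and the reference signature, scaled to 0-255.
    """
    sig_norm = math.sqrt(sum(s * s for s in signature))
    rows, cols = len(bands[0]), len(bands[0][0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            pixel = [band[r][c] for band in bands]
            pix_norm = math.sqrt(sum(p * p for p in pixel))
            if pix_norm == 0:
                row.append(0)  # no signal in any band
                continue
            dot = sum(p * s for p, s in zip(pixel, signature))
            cos = dot / (pix_norm * sig_norm)
            row.append(round(255 * max(0.0, cos)))
        out.append(row)
    return out

# Two-pixel example: the first pixel is proportional to the signature, the second is not.
band_a = [[80, 10]]
band_b = [[50, 10]]
band_c = [[30, 90]]
synthetic = synthesize([band_a, band_b, band_c])
```

In this sketch a brighter synthetic pixel indicates a closer match to the assumed signature, which is the property the downstream foreground detection relies on.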

In the following, reference is made to embodiments of the invention. However, it should be understood that the invention is not limited to any specifically described embodiment. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice the invention. Furthermore, in various embodiments the invention provides numerous advantages over the prior art. However, although embodiments of the invention may achieve advantages over other possible solutions and/or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the invention. Thus, the following aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s). Likewise, reference to “the invention” shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered to be an element or limitation of the appended claims except where explicitly recited in a claim(s).

One embodiment of the invention is implemented as a program product for use with a computer system. The program(s) of the program product defines functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Examples of computer-readable storage media include (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM or DVD-ROM disks readable by an optical media drive) on which information is permanently stored; (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive) on which alterable information is stored. Such computer-readable storage media, when carrying computer-readable instructions that direct the functions of the present invention, are embodiments of the present invention. Other example media include communications media through which information is conveyed to a computer, such as through a computer or telephone network, including wireless communications networks.

In general, the routines executed to implement the embodiments of the invention may be part of an operating system or a specific application, component, program, module, object, or sequence of instructions. The computer program of the present invention is comprised typically of a multitude of instructions that will be translated by the native computer into a machine-readable format and hence executable instructions. Also, programs are comprised of variables and data structures that either reside locally to the program or are found in memory or on storage devices. In addition, various programs described herein may be identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

FIG. 1 illustrates components of a video analysis and behavior-recognition system 100, according to one embodiment. As shown, the behavior-recognition system 100 includes a video input source 105, a network 110, a computer system 115, and input and output devices 118 (e.g., a monitor, a keyboard, a mouse, a printer, and the like). The network 110 may transmit video data recorded by the video input 105 to the computer system 115. Illustratively, the computer system 115 includes a CPU 120, storage 125 (e.g., a disk drive, optical disk drive, floppy disk drive, and the like), and a memory 130 which includes both a computer vision engine 135 and a machine-learning engine 140. As described in greater detail below, the computer vision engine 135 and the machine-learning engine 140 may provide software applications configured to analyze a sequence of video frames provided by the video input 105.

Network 110 receives video data (e.g., video stream(s), video images, or the like) from the video input source 105. The video input source 105 may be a video camera, a VCR, DVR, DVD, computer, web-cam device, or the like. For example, the video input source 105 may be a stationary video camera aimed at a certain area (e.g., a subway station, a parking lot, a building entry/exit, etc.), which records the events taking place therein. Generally, the area visible to the camera is referred to as the “scene.” The video input source 105 may be configured to record the scene as a sequence of individual video frames at a specified frame-rate (e.g., 24 frames per second), where each frame includes a fixed number of pixels (e.g., 320×240). Each pixel of each frame may specify a color value (e.g., an RGB value) or grayscale value (e.g., a radiance value between 0 and 255). Further, the video stream may be formatted using known formats including MPEG2, MJPEG, MPEG4, H.263, H.264, and the like.

In one embodiment, video input source 105 may capture infrared spectrum instead of visible light. Further, multiple cameras could be band-pass filtered to capture different wavelength bands within the infrared spectrum. In such a case, images from each camera could be registered to one another, allowing a composite image to be generated from the multiple cameras. As described in greater detail below, by using multiple observations of the sea surface in different wavelength bands of the infrared spectrum, the contrast between oil and seawater may be enhanced, making the oil more readily detectable to the background/foreground module. In one embodiment, the computer vision engine may filter clutter from the video input source, reducing input images to largely black (background) regions representing seawater and white (foreground) regions representing oil in the field of view of the cameras. In turn, the machine learning engine learns to filter noise from the observations of sea-surface oil, and generates alerts after observing an unusual appearance (or behavior) of sea-surface oil. Examples of noise that result in false-positive foreground blobs (i.e., false positive detections of patches of sea-surface oil) include lighting, absorption, and extinction artifacts in the video frames.

In one embodiment, the computer vision engine 135 is configured to receive input from a multiplexor module which multiplexes multiple data channels. Alternatively, the computer vision engine 135 may itself include the multiplexor module. In one embodiment, the multiplexor module may process data from three (or more) channels, co-adding image data and performing operations on the video streams. Each channel may correspond to a camera capturing a different portion of the infrared spectrum. The cameras may be positioned collinear to one another. That is, the cameras may each share a substantially identical field of view. Further, the images from the cameras may be registered to one another. As noted, however, each camera may cover a different band of the infrared spectrum. That is, each camera subsamples a different band of the infrared spectrum. In one embodiment, each camera is a long wavelength infrared (LWIR) camera with configurable filters. The multiplexor module may take the video signals from the video sources, combine them, as further described herein, and pass the information to the computer vision engine 135.

As noted above, the computer vision engine 135 may be configured to analyze image data (whether in the visible or IR spectrum (or otherwise)) to identify objects in the video stream, identify a variety of appearance and kinematic features used by a machine learning engine 140 to derive object classifications, derive a variety of metadata regarding the actions and interactions of such objects, and supply this information to the machine-learning engine 140. And in turn, the machine-learning engine 140 may be configured to evaluate, observe, learn and remember details regarding events (and types of events) that transpire within the scene over time.

In one embodiment, the machine-learning engine 140 receives the video frames and the data generated by the computer vision engine 135. The machine-learning engine 140 may be configured to analyze the received data, cluster objects having similar visual and/or kinematic features, and build semantic representations of events depicted in the video frames. Over time, the machine learning engine 140 learns expected patterns of behavior for objects that map to a given cluster. Thus, over time, the machine learning engine learns from these observed patterns to identify normal and/or abnormal events. That is, rather than having patterns, objects, object types, or activities defined in advance, the machine learning engine 140 builds its own model of what different object types have been observed (e.g., based on clusters of kinematic and/or appearance features) as well as a model of expected behavior for a given object type. Thereafter, the machine learning engine can decide whether the behavior of an observed event is anomalous or not based on prior learning.

Data describing whether anomalous sea-surface oil has been detected and/or describing the anomalous sea-surface oil may be provided to output devices 118 to issue alerts (e.g., an alert message presented on a GUI interface screen).

In general, the computer vision engine 135 and the machine-learning engine 140 both process video data in real-time. However, time scales for processing information by the computer vision engine 135 and the machine-learning engine 140 may differ. For example, in one embodiment, the computer vision engine 135 processes the received video data frame-by-frame, while the machine-learning engine 140 processes data every N-frames. In other words, while the computer vision engine 135 may analyze each frame in real-time to derive a set of appearance and kinematic data related to objects observed in the frame, the machine-learning engine 140 is not constrained by the real-time frame rate of the video input.
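The two time scales can be illustrated with a small sketch in which a per-frame step runs on every frame and a learning step runs once every N frames. The function and parameter names (`run_pipeline`, `analyze_frame`, `learn_batch`) are hypothetical and chosen only for this example.

```python
def run_pipeline(frames, n, analyze_frame, learn_batch):
    """Analyze every frame; hand accumulated results to the learner every n frames."""
    pending = []
    for frame in frames:
        pending.append(analyze_frame(frame))  # per-frame (computer vision) step
        if len(pending) == n:                 # every-N-frames (learning) step
            learn_batch(pending)
            pending = []
    return pending  # observations not yet delivered to the learner

# Seven frames, learner invoked every three frames.
batches = []
leftover = run_pipeline(range(7), 3, lambda f: f * 10, batches.append)
```

Here the learner receives two batches of three observations each, and one observation remains pending until more frames arrive.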

Note, however, FIG. 1 illustrates merely one possible arrangement of the behavior-recognition system 100. For example, although the video input source 105 is shown connected to the computer system 115 via the network 110, the network 110 is not always present or needed (e.g., the video input source 105 may be directly connected to the computer system 115). Further, various components and modules of the behavior-recognition system 100 may be implemented in other systems. For example, in one embodiment, the computer vision engine 135 may be implemented as a part of a video input device (e.g., as a firmware component wired directly into a video camera). In such a case, the output of the video camera may be provided to the machine-learning engine 140 for analysis. Similarly, the output from the computer vision engine 135 and machine-learning engine 140 may be supplied over computer network 110 to other computer systems. For example, the computer vision engine 135 and machine-learning engine 140 may be installed on a server system and configured to process video from multiple input sources (i.e., from multiple cameras). In such a case, a client application 250 running on another computer system may request (or receive) the results over network 110.

FIG. 2 further illustrates components of the computer vision engine 135 and the machine-learning engine 140 first illustrated in FIG. 1, according to one embodiment of the invention. As shown, the computer vision engine 135 includes a background/foreground (BG/FG) component 205, a tracker component 210, an estimator/identifier component 215, and a context processor component 220. Collectively, the components 205, 210, 215, and 220 provide a pipeline for processing an incoming sequence of video frames supplied by the video input source 105 (indicated by the solid arrows linking the components). Additionally, the output of one component may be provided to multiple stages of the component pipeline (as indicated by the dashed arrows) as well as to the machine-learning engine 140. In one embodiment, the components 205, 210, 215, and 220 may each provide a software module configured to provide the functions described herein. Of course one of ordinary skill in the art will recognize that the components 205, 210, 215, and 220 may be combined (or further subdivided) to suit the needs of a particular case and further that additional components may be added (or some may be removed) from a video surveillance system.

In one embodiment, the BG/FG component 205 may be configured to separate each frame of video provided by the video input source 105 into a static part (the scene background) and a collection of volatile parts (the scene foreground). The frame itself may include a two-dimensional array of pixel values for multiple channels (e.g., RGB channels for color video or grayscale channel or radiance channel for black and white video). In one embodiment, the BG/FG component 205 may model background states for each pixel using an adaptive resonance theory (ART) network. That is, each pixel may be classified as depicting scene foreground or scene background using an ART network modeling a given pixel. Of course, other approaches to distinguish between scene foreground and background may be used. Again, in context of this discussion, the background may generally correspond to pixels depicting seawater, whereas foreground may generally correspond to pixels depicting sea-surface oil.
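As an illustration of per-pixel background modeling, the following sketch uses an exponential running average for each pixel. This is a deliberately simpler stand-in, one of the "other approaches" and not the ART network named above; the class name `RunningBackground` and the `alpha` and `threshold` values are assumptions made for this example.

```python
class RunningBackground:
    """Per-pixel exponential running average; NOT the ART network named in the text."""

    def __init__(self, first_frame, alpha=0.05, threshold=30):
        self.model = [[float(v) for v in row] for row in first_frame]
        self.alpha = alpha          # hypothetical learning rate
        self.threshold = threshold  # hypothetical foreground cutoff

    def apply(self, frame):
        """Return a 0/1 foreground mask and update the background model."""
        mask = []
        for r, row in enumerate(frame):
            mask_row = []
            for c, value in enumerate(row):
                is_fg = abs(value - self.model[r][c]) > self.threshold
                mask_row.append(1 if is_fg else 0)
                if not is_fg:  # only background pixels update the model
                    self.model[r][c] += self.alpha * (value - self.model[r][c])
            mask.append(mask_row)
        return mask

bg = RunningBackground([[10, 10], [10, 10]])
mask = bg.apply([[12, 200], [10, 9]])
```

In this example only the pixel that jumps from 10 to 200 is marked as foreground; the small fluctuations are absorbed into the background model.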

Additionally, the BG/FG component 205 may be configured to generate a mask used to identify which pixels of the scene are classified as depicting foreground and, conversely, which pixels are classified as depicting scene background. The BG/FG component 205 then identifies regions of the scene that contain a portion of scene foreground (referred to as a foreground “blob” or “patch”) and supplies this information to subsequent stages of the pipeline. Additionally, pixels classified as depicting scene background may be used to generate a background image modeling the scene.
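Identifying foreground blobs from a mask can be sketched as a connected-component search. The choice of 4-connectivity and the function name `find_blobs` are assumptions for illustration; the description does not state how contiguous regions are extracted.

```python
from collections import deque

def find_blobs(mask):
    """Group 4-connected foreground pixels (value 1) into blobs.

    Returns a list of blobs, each a list of (row, col) tuples.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            queue, blob = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs

mask = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
blobs = find_blobs(mask)
```

The example mask yields two blobs, one of three pixels and one of two.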

In context of detecting and evaluating sea-surface oil, the BG/FG component classifies pixels depicting surface oil as foreground. Thus, the computer vision engine is being used as a “blob” detector/tracker, where blobs of pixels classified as foreground correspond to patches of sea-surface oil. In such a case, blobs do not need to address occlusion or depth ordering. Instead, blobs that intersect may be merged.
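Merging intersecting blobs can be sketched with bounding boxes. Representing a blob by an inclusive `(top, left, bottom, right)` box and testing intersection on those boxes is an assumption of this example, not a detail given in the description.

```python
def boxes_intersect(a, b):
    """Boxes are (top, left, bottom, right) in inclusive pixel coordinates."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def merge_boxes(boxes):
    """Repeatedly merge any two intersecting boxes until none intersect."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_intersect(boxes[i], boxes[j]):
                    a, b = boxes[i], boxes[j]
                    # Replace box i with the union of the two boxes and drop box j.
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes

result = merge_boxes([(0, 0, 2, 2), (2, 2, 4, 4), (10, 10, 11, 11)])
```

The first two boxes share a corner pixel and are merged into one; the third is left unchanged.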

The tracker component 210 may receive the foreground patches produced by the BG/FG component 205 and generate computational models for the patches. The tracker component 210 may be configured to use this information, and each successive frame of raw-video, to attempt to track the motion of an object depicted by a given foreground patch as it moves about the scene. That is, the tracker component 210 provides continuity to other elements of the system by tracking a given object from frame-to-frame.

The estimator/identifier component 215 may receive the output of the tracker component 210 (and the BG/FG component 205) and identify a variety of kinematic and/or appearance features of a foreground object, e.g., size, height, width, and area (in pixels), reflectivity, shininess, rigidity, speed, velocity, etc.

In context of detecting sea-surface oil, the features of a foreground object (a blob of pixels) may include the location and sizes of a foreground blob. Note, the computer vision engine could correct for distance and the solid angle effects distorting the size of a foreground object detected at different areas within the field of view of a camera. Other features of a foreground blob may include rates of change in blob size and/or a measure of intensity (i.e., how bright the blob is), motion characteristics of the foreground blobs, whether the foreground blobs have non-sharp edges, whether the foreground blobs have high fractal dimension, and whether the foreground blobs are asymmetrical.
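A few of the blob features listed above (location, size, intensity, and rate of change in size) can be computed as in the following sketch. The function names and the dictionary layout are hypothetical; corrections for distance and solid angle, edge sharpness, fractal dimension, and asymmetry are not attempted here.

```python
def blob_features(pixels, image):
    """Compute simple features for a blob given as (row, col) tuples."""
    area = len(pixels)
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return {
        "area": area,                                      # size in pixels
        "centroid": (sum(rows) / area, sum(cols) / area),  # location
        "bbox": (min(rows), min(cols), max(rows), max(cols)),
        "mean_intensity": sum(image[r][c] for r, c in pixels) / area,
    }

def growth_rate(prev_area, curr_area, dt_seconds):
    """Change in blob area per second between two observations."""
    return (curr_area - prev_area) / dt_seconds

image = [[0, 200, 220], [0, 0, 240]]
features = blob_features([(0, 1), (0, 2), (1, 2)], image)
rate = growth_rate(3, 5, 0.2)  # two pixels gained over one 200 ms sample interval
```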

The context processor component 220 may receive the output from other stages of the pipeline (i.e., the tracked objects, the background and foreground models, and the results of the estimator/identifier component 215). Using this information, the context processor 220 may be configured to generate a stream of context events regarding objects tracked (by tracker component 210) and evaluated (by estimator/identifier component 215). For example, the context processor component 220 may package a stream of micro-feature vectors and kinematic observations of an object and output this to the machine-learning engine 140, e.g., at a rate of 5 Hz. In one embodiment, the context events are packaged as a trajectory. As used herein, a trajectory generally refers to a vector packaging the kinematic data of a particular foreground object in successive frames or samples. Each element in the trajectory represents the kinematic data captured for that object at a particular point in time. Typically, a complete trajectory includes the kinematic data obtained when an object is first observed in a frame of video along with each successive observation of that object up to when it leaves the scene (or becomes stationary to the point of dissolving into the frame background). Accordingly, assuming computer vision engine 135 is operating at a rate of 5 Hz, a trajectory for an object is updated every 200 milliseconds, until complete.
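A trajectory updated at 5 Hz can be sketched as a simple container of timestamped kinematic samples. The class `Trajectory` and its field names are hypothetical; only the 5 Hz rate and the resulting 200 millisecond update interval come from the description.

```python
from dataclasses import dataclass, field

SAMPLE_RATE_HZ = 5                         # rate given in the text
SAMPLE_PERIOD_MS = 1000 // SAMPLE_RATE_HZ  # 200 ms between updates

@dataclass
class Trajectory:
    """Kinematic samples for one foreground object, oldest first."""
    object_id: int
    samples: list = field(default_factory=list)
    complete: bool = False

    def update(self, t_ms, position, size):
        """Append one observation; rejected once the trajectory is complete."""
        if self.complete:
            raise ValueError("trajectory already complete")
        self.samples.append({"t_ms": t_ms, "position": position, "size": size})

    def close(self):
        """Mark the trajectory complete once the object leaves the scene."""
        self.complete = True

traj = Trajectory(object_id=1)
for i in range(3):
    traj.update(i * SAMPLE_PERIOD_MS, position=(10 + i, 20), size=50 + 5 * i)
traj.close()
```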

The computer vision engine 135 may take the output from the components 205, 210, 215, and 220 describing the motions and actions of the tracked objects in the scene and supply this information to the machine-learning engine 140. In context of detecting sea-surface oil, the context event package may include a list of foreground blobs (patches of surface oil) detected by the computer vision engine 135, the size and position of each blob, and a trajectory of a blob observed over time. The context event package passed to the machine learning engine 140 could also include any other features of a foreground object detected or generated by components of the computer vision engine 135, as well as the raw data received from the video feeds.

Illustratively, the machine-learning engine 140 includes a long-term memory 225, a perceptual memory 230, an episodic memory 235, a workspace 240, codelets 245, a micro-feature classifier 255, a cluster layer 260 and a sequence layer 265. Additionally, the machine-learning engine 140 includes a client application 250, allowing the user to interact with the video surveillance system 100 using a graphical user interface. Further still, the machine-learning engine 140 includes an event bus 222. In one embodiment, the components of the computer vision engine 135 and machine-learning engine 140 output data to the event bus 222. At the same time, the components of the machine-learning engine 140 may also subscribe to receive different event streams from the event bus 222. For example, the micro-feature classifier 255 may subscribe to receive the micro-feature vectors output from the computer vision engine 135.
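The publish/subscribe role of the event bus can be illustrated with a minimal sketch. The class `EventBus`, its method names, and the stream names used below are assumptions for this example rather than details of event bus 222.

```python
from collections import defaultdict

class EventBus:
    """Minimal publish/subscribe bus keyed by stream name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, stream, callback):
        """Register a callback for every event published on the stream."""
        self._subscribers[stream].append(callback)

    def publish(self, stream, event):
        """Deliver the event to each subscriber of the stream, in order."""
        for callback in self._subscribers[stream]:
            callback(event)

bus = EventBus()
received = []
bus.subscribe("micro_feature_vectors", received.append)
bus.publish("micro_feature_vectors", [0.2, 0.7])
bus.publish("trajectories", {"object_id": 1})  # no subscriber, so nothing is delivered
```

In this sketch a component receives only the streams it subscribed to, mirroring how the micro-feature classifier would receive micro-feature vectors but not other event streams.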

Generally, the workspace 240 provides a computational engine for the machine-learning engine 140. For example, the workspace 240 may be configured to copy information from the perceptual memory 230, retrieve relevant memories from the episodic memory 235 and the long-term memory 225, and select which codelets 245 to execute. Each codelet 245 may be a software program configured to evaluate different sequences of events and to determine how one sequence may follow (or otherwise relate to) another (e.g., a finite state machine). More generally, each codelet may provide a software module configured to detect interesting patterns from the streams of data fed to the machine-learning engine. In turn, the codelet 245 may create, retrieve, reinforce, or modify memories in the episodic memory 235 and the long-term memory 225. By repeatedly scheduling codelets 245 for execution, copying memories and percepts to/from the workspace 240, the machine-learning engine 140 performs a cognitive cycle used to observe, and learn, about patterns of behavior that occur within the scene.

In one embodiment, the perceptual memory 230, the episodic memory 235, and the long-term memory 225 are used to identify patterns of behavior, evaluate events that transpire in the scene, and encode and store observations. Generally, the perceptual memory 230 receives the output of the computer vision engine 135 (e.g., the context event stream). The episodic memory 235 stores data representing observed events with details related to a particular episode, e.g., information describing time and space details related to an event. That is, the episodic memory 235 may encode specific details of a particular event, i.e., “what and where” something occurred within a scene, such as a particular vehicle (car A) moved to a location believed to be a parking space (parking space 5) at 9:43 AM.

In contrast, the long-term memory 225 may store data generalizing events observed in the scene. To continue with the example of a vehicle parking, the long-term memory 225 may encode information capturing observations and generalizations learned by an analysis of the behavior of objects in the scene such as “vehicles in certain areas of the scene tend to be in motion,” “vehicles tend to stop in certain areas of the scene,” etc. Thus, the long-term memory 225 stores observations about what happens within a scene with much of the particular episodic details stripped away. In this way, when a new event occurs, memories from the episodic memory 235 and the long-term memory 225 may be used to relate and understand a current event, i.e., the new event may be compared with past experience, leading to reinforcement, decay, and adjustments to the information stored in the long-term memory 225, over time. In a particular embodiment, the long-term memory 225 may be implemented as an ART network and a sparse-distributed memory data structure.

The micro-feature classifier 255 may schedule a codelet 245 to evaluate the micro-feature vectors output by the computer vision engine 135. As noted, the computer vision engine 135 may track objects frame-to-frame and generate micro-feature vectors for each foreground object at a rate of, e.g., 5 Hz. In one embodiment, the micro-feature classifier 255 may be configured to create clusters from this stream of micro-feature vectors. For example, each micro-feature vector may be supplied to an input layer of the ART network (or a combination of a self organizing map (SOM) and ART network used to cluster nodes in the SOM). In response, the ART network maps the micro-feature vector to a cluster in the ART network and updates that cluster (or creates a new cluster if the input micro-feature vector is sufficiently dissimilar to the existing clusters). Each cluster is presumed to represent a distinct object type, and objects sharing similar micro-feature vectors (as determined using the choice and vigilance parameters of the ART network) may map to the same cluster.
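The map-or-create behavior governed by choice and vigilance parameters can be sketched with a simplified Fuzzy ART network. The description does not say which ART variant is used, so the use of Fuzzy ART with complement coding, the class name `FuzzyART`, and all parameter values here are assumptions for illustration; inputs are assumed to be scaled to the range 0 to 1.

```python
class FuzzyART:
    """Simplified Fuzzy ART clustering for inputs with components in [0, 1]."""

    def __init__(self, vigilance=0.75, choice=0.001, rate=1.0):
        self.rho, self.alpha, self.beta = vigilance, choice, rate
        self.weights = []  # one weight vector per cluster

    def train(self, vector):
        """Map the vector to a cluster (creating one if needed); return its index."""
        i = list(vector) + [1.0 - v for v in vector]  # complement coding
        norm_i = sum(i)

        def choice_value(j):
            # Choice function: |I ^ w_j| / (alpha + |w_j|), with ^ as element-wise min.
            w = self.weights[j]
            return sum(map(min, i, w)) / (self.alpha + sum(w))

        for j in sorted(range(len(self.weights)), key=choice_value, reverse=True):
            overlap = [min(a, b) for a, b in zip(i, self.weights[j])]
            if sum(overlap) / norm_i >= self.rho:  # vigilance test
                self.weights[j] = [self.beta * o + (1 - self.beta) * w
                                   for o, w in zip(overlap, self.weights[j])]
                return j
        self.weights.append(i)  # too dissimilar to every cluster: create a new one
        return len(self.weights) - 1

art = FuzzyART(vigilance=0.75)
vectors = ([0.1, 0.1], [0.15, 0.12], [0.9, 0.85], [0.88, 0.9])
labels = [art.train(v) for v in vectors]
```

With these inputs the first two vectors share one cluster and the last two share another. Raising the vigilance value makes the network split inputs into more, tighter clusters.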

For example, the micro-features associated with observations of many different vehicles may be similar enough to map to the same cluster (or group of clusters). At the same time, observations of many different people may map to a different cluster (or group of clusters) than the vehicles cluster. Thus, each distinct cluster in the ART network generally represents a distinct type of object acting within the scene. And as new objects enter the scene, new object types may emerge in the ART network.

Importantly, however, this approach does not require the different object type classifications to be defined in advance; instead, object types emerge over time as distinct clusters in the ART network. In one embodiment, the micro-feature classifier 255 may assign an object type identifier to each cluster, providing a different object type for each cluster in the ART network.
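The cluster-or-create behavior described above can be illustrated with a short sketch. The following is a minimal Fuzzy ART variant written for illustration only; it is not the implementation used by the micro-feature classifier 255. The class name, the default parameter values, and the assumption that each micro-feature is already scaled to the range [0,1] are all hypothetical.

```python
from typing import List


class FuzzyART:
    """Illustrative Fuzzy ART clusterer for vectors with components in [0, 1]."""

    def __init__(self, vigilance: float = 0.75, choice: float = 0.001, rate: float = 1.0):
        self.vigilance = vigilance  # rho: minimum match required to join a cluster
        self.choice = choice        # alpha: small positive term in the choice function
        self.rate = rate            # beta: learning rate (1.0 means fast learning)
        self.weights: List[List[float]] = []  # one weight vector per cluster

    @staticmethod
    def _complement_code(x: List[float]) -> List[float]:
        # Complement coding keeps the norm of every coded input constant.
        return list(x) + [1.0 - v for v in x]

    def learn(self, x: List[float]) -> int:
        """Map x to a cluster, update it (or create a new one), and return its index."""
        i = self._complement_code(x)
        norm_i = sum(i)
        # Score each cluster with the choice function |I ^ w| / (alpha + |w|).
        scored = []
        for j, w in enumerate(self.weights):
            overlap = sum(min(a, b) for a, b in zip(i, w))
            scored.append((overlap / (self.choice + sum(w)), overlap, j))
        for _, overlap, j in sorted(scored, key=lambda s: s[0], reverse=True):
            # Vigilance test: accept the best-scoring cluster that matches closely enough.
            if overlap / norm_i >= self.vigilance:
                w = self.weights[j]
                self.weights[j] = [
                    self.rate * min(a, b) + (1.0 - self.rate) * b for a, b in zip(i, w)
                ]
                return j
        # No existing cluster is similar enough, so a new object type emerges.
        self.weights.append(i)
        return len(self.weights) - 1


art = FuzzyART(vigilance=0.8)
cluster_ids = [art.learn(v) for v in ([0.10, 0.20], [0.12, 0.22], [0.90, 0.85])]
```

In this sketch the first two vectors are similar enough to share a cluster, while the third fails the vigilance test against that cluster and starts a new one. Raising the vigilance value yields more, tighter clusters; lowering it yields fewer, broader ones.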

In an alternative embodiment, rather than generate clusters from the micro-feature vectors directly, the micro-feature classifier 255 may supply the micro-feature vectors to a self-organizing map (SOM) structure. In such a case, the ART network may cluster nodes of the SOM and assign an object type identifier to each cluster, and each SOM node mapping to the same cluster is presumed to represent an instance of a common type of object.

As shown, the machine-learning engine 140 also includes a cluster layer 260 and a sequence layer 265. The cluster layer 260 may be configured to generate clusters from the trajectories of objects classified by the micro-feature classifier 255 as being an instance of a common object type. In one embodiment, the cluster layer 260 uses a combination of a self-organizing map (SOM) and an ART network to cluster the kinematic data in the trajectories. Once the trajectories are clustered, the sequence layer 265 may be configured to generate sequences encoding the observed patterns of behavior represented by the trajectories. And once generated, the sequence layer 265 may identify segments within a sequence using a voting experts technique. Further, the sequence layer 265 may be configured to identify anomalous segments and sequences.

In the context of detecting sea-surface oil, the machine learning engine 140 may observe foreground blobs (presumably patches of sea-surface oil) and, over time, identify where patches tend to appear, how frequently patches appear, how long a patch remains, how large patches tend to be, etc. And after observing a sea-surface area for a period of time, the machine learning engine 140 may distinguish between (1) patches of surface oil that occur incident to the normal operations of an offshore drilling platform and other spurious oil patches, and (2) patches of surface oil that need to be investigated or evaluated by platform personnel. That is, given the complexity of a maritime environment, and the complexity of reflections and spurious light and oil observations in the proximity of boats, ships and offshore platforms, the machine learning engine 140 is used to learn to identify what are “normal” observations of sea-surface oil and what are “abnormal” or “unusual” observations that require investigation.

Detecting Anomalous Sea-Surface Oil in a Machine-Learning Video Analytics System

As noted above, a machine-learning video analytics system may be configured to use a computer vision engine to observe a scene, generate information streams of observed activity, and pass the streams to a machine learning engine. In turn, the machine learning engine may engage in an undirected and unsupervised learning approach to learn patterns regarding the object behaviors in that scene. Thereafter, when unexpected (i.e., abnormal or unusual) behavior is observed, alerts may be generated.

In one embodiment, a multiplexor module is configured to receive video streams (also referred to herein as “signals”) from three or more long-wavelength infrared (LWIR) cameras whose output is filtered by distinct band-pass filters, and to multiplex the signals to generate a single synthetic signal whose brightness indicates a match with an IR signature of sea-surface oil. A computer vision engine determines, from the synthetic signal, foreground blobs representing patches of contiguous pixels having values indicating a match to the IR signature of oil, and further extracts features such as position, size, change in size, etc., which are pertinent to sea-surface oil. In turn, a machine learning engine is configured to build models of behaviors within the scene based on the foreground blobs and extracted features, and to determine whether observations indicate that the behavior of an object is anomalous or not, relative to the model. In one embodiment, for example, the machine learning engine may model observed sea-surface oil over time (including spurious oil), and determine whether any given foreground blob corresponding to sea-surface oil is unusual or anomalous relative to prior sea-surface oil which has been observed. The machine learning engine may issue an alert when anomalous sea-surface oil is observed so that the oil may be investigated.

FIG. 3 illustrates a system for generating a synthetic video stream for detecting sea-surface oil and deriving expected patterns in the synthetic video stream, according to one embodiment. As shown, the system includes LWIR cameras 310-330 which capture the “thermal” part of the light spectrum. Captured light from each of the cameras 310-330 is filtered using a respective spectral band-pass filter to generate filtered signals in a distinct wavelength band.

According to physics theory, objects made of normal matter (e.g., electrons, protons, and neutrons) and having finite non-zero temperatures continuously emit electromagnetic radiation. Depending on the temperature of a given object, the emitted radiation may mostly be X-rays, ultraviolet light, visible light, infrared light, microwaves, or radio waves. An idealized object called a blackbody that is perfectly efficient at this process would emit radiation energy as a function of the wavelength of the light according to Planck's law:

$\begin{matrix}{{{B_{\lambda}(T)} = \frac{2{{hc}^{2}/\lambda^{5}}}{\left( {^{{{hc}/\lambda}\; {kT}} - 1} \right)}},} & (1)\end{matrix}$

where h is Planck's constant, c is the speed of light, k is Boltzmann's constant, λ is the wavelength of emitted radiation, and T is the temperature (K) of the emitting object. Real physical objects are not perfectly efficient radiators of electromagnetic radiation, and emit a distribution of energy that differs from Planck's law and is given by:

$\begin{matrix}{{{B_{\lambda}\left( {T,\theta} \right)} = {\frac{2{{hc}^{2}/\lambda^{5}}}{\left( {^{{{hc}/\lambda}\; {kT}} - 1} \right)}{\varepsilon_{\lambda}\left( {T,\theta} \right)}}},} & (2)\end{matrix}$

where ε_(λ) is the spectral emissivity and will generally be a complicated function of wavelength, angle, and temperature, as an object can radiate more efficiently in some directions and/or colors of light than others and this dependence may vary with temperature. A related quantity, the spectral reflectivity ρ_(λ), that describes what fraction of energy incident upon an object is reflected back, may likewise be a function of temperature, angle, and wavelength.
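Equations (1) and (2) can be evaluated numerically, as in the following sketch. It assumes SI units throughout and, for simplicity, treats the spectral emissivity as a single number supplied by the caller rather than as a function of wavelength, angle, and temperature. The function names are hypothetical, and no emissivity values for oil or seawater are implied.

```python
import math

H = 6.62607015e-34  # Planck's constant, J*s
C = 2.99792458e8    # speed of light, m/s
K = 1.380649e-23    # Boltzmann's constant, J/K


def planck_radiance(wavelength_m: float, temp_k: float) -> float:
    """Blackbody spectral radiance of equation (1), in W per (m^2 * sr * m)."""
    exponent = (H * C) / (wavelength_m * K * temp_k)
    # math.expm1(x) computes e**x - 1 without loss of precision for small x.
    return (2.0 * H * C**2 / wavelength_m**5) / math.expm1(exponent)


def emitted_radiance(wavelength_m: float, temp_k: float, emissivity: float) -> float:
    """Equation (2), with the emissivity assumed constant (a simplification)."""
    return emissivity * planck_radiance(wavelength_m, temp_k)


# Radiance of a 300 K blackbody at 10 micrometers, inside the LWIR band.
lwir_radiance = planck_radiance(10e-6, 300.0)
```

For a blackbody at 300 K and a wavelength of 10 μm, this evaluates to roughly 10 W per square meter per steradian per micrometer of wavelength, which illustrates why surfaces near ambient temperature are readily observed by LWIR cameras.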

The spectral emissivity and spectral reflectivity of oil and water at different temperatures are well known. The contrast (i.e., the difference) between modeled spectral radiances (here, the combination of emission and reflected radiances) of seawater and oil is shown in FIG. 4, which depicts the contrasts |radiance_(oil)−radiance_(sea)| at various temperatures from 275-325 K during the daytime 410 and during the nighttime 420. Note, the observed ocean will typically be acting as an emissive source of radiation with a temperature somewhere between 275 K and 325 K at nighttime, and during the daytime, there will be additional components to the radiation field corresponding to the reflected sunlight as well. In one embodiment, the contrast, at a given temperature, between the emitted radiance curves of seawater versus oil and/or reflected radiance of seawater versus oil, or a combination of the two, may be used to distinguish between water and oil. For example, oil and seawater may be distinguished using an approach which is sensitive to the shape of the curves in FIG. 4.

Returning to FIG. 3, a thermal camera multiplexor module 340 may be configured to multiplex the input from cameras 310-330 equipped with band-pass filters, producing signals B₁, B₂, and B₃. Each of signals B₁, B₂, and B₃ may produce a single data-point per image-pixel corresponding to a grayscale brightness of the scene in that particular spectral band. In one embodiment, signal B₁ may be generated using an 8.0-9.0 μm band-pass filter, signal B₂ may be generated using an 8.0-11.5 μm band-pass filter, and signal B₃ may be generated using an 8.0-13.0 μm band-pass filter. In alternative embodiments, other band-pass filters may be used, including more (or fewer) than three band-pass filters and band-pass filters for different wavelength ranges. In yet another embodiment, vertical polarizing filters may also be used to minimize specular reflection effects.

The multiplexor module 340 may then compute the differences between the inputs a=B₂−B₁, b=B₃−B₁, c=B₃−B₂ at 342₁₋₃ and examine the relative sizes of the contrast values. In one embodiment, given the differences between the inputs a, b, and c, the multiplexor module 340 may generate a synthetic discriminant video stream by taking the ratio

$\frac{{s_{1}\left( {s_{1} + a} \right)}\left( {s_{2} + b} \right)}{\left( {s_{2} + c} \right)^{2}},$

where s₁ and s₂ are constants which normalize the ratio to, e.g., the range [0,1], with 0 being water and 1 being oil. Here, the synthetic video stream is directly proportional to the differences a and b, which correspond to contrasts between wavelength ranges in which the difference between radiance from seawater is substantially greater than the difference between radiance from oil, and inversely proportional to the difference c, which corresponds to a contrast between wavelength ranges in which the difference between radiance from oil is substantially greater than the difference between radiance from seawater. As a result, the synthetic video stream tends to maximize the contrast between the spectral signatures of water and oil. That is, the synthetic video stream may be a black-and-white video stream in which the brightness of each image pixel corresponds to how closely the IR signature of the pixel matches what would be expected from oil.
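As an illustration of the per-pixel arithmetic, the following sketch computes the differences a, b, and c and the ratio above for three co-registered frames represented as nested lists of brightness values. The function name, the frame representation, and the choice to output 0 when the denominator vanishes are assumptions made for this example. Suitable values of s₁ and s₂ depend on the cameras and filters used and are not specified here.

```python
from typing import List

Frame = List[List[float]]  # one grayscale brightness value per pixel


def discriminant_frame(b1: Frame, b2: Frame, b3: Frame, s1: float, s2: float) -> Frame:
    """Combine three band-pass filtered frames into one discriminant frame."""
    out: Frame = []
    for row1, row2, row3 in zip(b1, b2, b3):
        out_row = []
        for p1, p2, p3 in zip(row1, row2, row3):
            a = p2 - p1  # contrast between bands B2 and B1
            b = p3 - p1  # contrast between bands B3 and B1
            c = p3 - p2  # contrast between bands B3 and B2
            denominator = (s2 + c) ** 2
            # Assumption: emit 0.0 rather than divide by zero when s2 + c == 0.
            value = s1 * (s1 + a) * (s2 + b) / denominator if denominator else 0.0
            out_row.append(value)
        out.append(out_row)
    return out


# A single-pixel example with placeholder constants s1 = s2 = 1.
example = discriminant_frame([[1.0]], [[3.0]], [[4.0]], s1=1.0, s2=1.0)
```

For the single pixel shown, a=2, b=3, and c=1, so the ratio is 1·(1+2)·(1+3)/(1+1)² = 3. In practice s₁ and s₂ would be chosen so that the output falls in the intended range.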

As shown, the synthetic discriminant video stream output by the multiplexor module 340 is subsequently input to video analysis system 350, which is similar to the video analysis system described in conjunction with FIGS. 1-2. The video analysis system 350 may include a computer vision engine which determines, from the synthetic video stream, foreground blobs representing patches of contiguous pixels having values indicating a match to the IR signature of oil using, e.g., per-pixel ART networks, as previously discussed. The computer vision engine may also extract features such as locations and sizes of foreground blobs, rates of change in blob size and/or a measure of intensity (i.e., how bright the blob is), motion characteristics of the foreground blobs, whether the foreground blobs have non-sharp edges, whether the foreground blobs have high fractal dimension, and whether the foreground blobs are asymmetrical, etc., which are pertinent to sea-surface oil. The video analysis system 350 may further include a machine learning engine which receives the foreground blobs and features extracted by the computer vision engine, and which engages in undirected and unsupervised learning to discern patterns of object behaviors in the scene of the synthetic discriminant video stream, discussed in greater detail below. Thereafter, when unexpected (i.e., abnormal or unusual) sea-surface oil is observed, the machine learning engine may generate an alert so that the sea-surface oil may be investigated.

FIG. 5 illustrates an exemplary geometry for mounting video cameras on an offshore oil platform 510, according to one embodiment. As shown, the oil platform 510 includes a mast 515 on which one or more sets of LWIR cameras 520 are mounted, at a height h above a sea surface 500, to observe the sea surface 500. Each set of LWIR cameras may include three or more cameras, with the signal of each camera in the set filtered by a band-pass filter for a distinct wavelength range and the filtered signals being multiplexed to generate a synthetic discriminant video stream that is input to a video analysis system. In one embodiment, several sets of fixed cameras, each oriented toward a different azimuth, may be used to achieve full 360° azimuthal coverage of the sea surface. In an alternative embodiment, a single set of cameras may be configured to perform a continuing guard-tour sweep to achieve full azimuthal coverage.

Illustratively, the cameras 520 are able to view a segment of the sea surface beginning from near the platform 510 at r₁ and extending out to a distance r₂, which may be, e.g., several kilometers away from the platform 510. In one embodiment, wide-angle camera lenses may be used to view a relatively large portion of the sea surface. Generally, the higher the cameras 520 are placed (i.e., the greater h is), the farther away the apparent horizon will be, and the farther the cameras 520 will be able to see. However, due to effects from, e.g., sea-surface spray and aerosols, discrimination of oil from seawater may not be possible out to the horizon itself. The particular maximum distance and limiting ranges may depend on the cameras 520 used and the arrangement of the oil platform 510, among other things.
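The dependence of the horizon distance on the mounting height h can be estimated with elementary geometry. The sketch below assumes a spherical Earth of mean radius 6,371 km and ignores atmospheric refraction, spray, and aerosols, so it gives only an upper bound on r₂ rather than a usable detection range. The function name and the 30 m example height are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, in meters


def horizon_distance_m(height_m: float, radius_m: float = EARTH_RADIUS_M) -> float:
    """Line-of-sight distance from a camera at height_m to the geometric horizon."""
    # Right triangle: (R + h)^2 = R^2 + d^2, so d = sqrt(2*R*h + h^2).
    return math.sqrt(2.0 * radius_m * height_m + height_m**2)


# Example: cameras mounted 30 m above the sea surface.
horizon_30m = horizon_distance_m(30.0)
```

Under these assumptions, cameras mounted 30 m above the sea surface have a geometric horizon between 19 km and 20 km away, and because the distance grows roughly with the square root of h, doubling the height extends it by only about 41%.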

FIG. 6 illustrates a method 600 for detecting and reporting on anomalous sea-surface oil, according to one embodiment. As shown, the method 600 begins at step 610, where a camera multiplexor module receives video frames from LWIR cameras with distinct spectral band-pass filters. In one embodiment, three or more cameras may be used for purposes of detecting surface oil, and the band-pass filters may be chosen so as to let through light in wavelength ranges in which the radiance contrast between seawater and surface oil is relatively large. Doing so may permit the spectral radiance signatures of seawater and surface oil to be more clearly distinguishable from each other. In a particular embodiment, video frames B₁, B₂, and B₃ may be received, with the B₁ signal being filtered by an 8.0-9.0 μm band-pass filter, the B₂ signal being filtered by an 8.0-11.5 μm band-pass filter, and the B₃ signal being filtered by an 8.0-13.0 μm band-pass filter. Note, the specific bands here are representative values given for illustrative purposes, and the actual bands used may be different in other embodiments.

At step 620, the multiplexor module combines the received frames to create a synthetic discriminant video stream with brightness corresponding to a match with the IR signature of oil. In one embodiment, the multiplexor module may compute differences between pairs of received video frames in different wavelength ranges. In such a case, the synthetic video stream may be directly proportional to the difference(s) which correspond to contrasts between wavelength ranges in which the difference between radiance from seawater is substantially greater than the difference between radiance from oil, and inversely proportional to the difference(s) which correspond to contrasts between wavelength ranges in which the difference between radiance from oil is substantially greater than the difference between radiance from seawater, or vice versa. Returning to the example of received signals B₁, B₂, and B₃ discussed above, the multiplexor module may compute the differences between the inputs a=B₂−B₁, b=B₃−B₁, c=B₃−B₂, and generate the discriminant video stream by taking the ratio

$\frac{{s_{1}\left( {s_{1} + a} \right)}\left( {s_{2} + b} \right)}{\left( {s_{2} + c} \right)^{2}},$

where s₁ and s₂ are constants which normalize the ratio to, e.g., the range [0,1], with 0 being water and 1 being oil. In alternative embodiments, the particular form of the equation for multiplexing the received frames to generate a discriminant signal may be different.

At step 630, a video analysis system analyzes and learns behavioral patterns in the synthetic video stream. As discussed, a computer vision engine of the video analysis system may separate foreground blobs depicting oil from background depicting seawater given the synthetic discriminant video stream with brightness corresponding to a match with the IR signature of oil. For example, the computer vision engine may model the scene background and select pixels as foreground using per-pixel ART networks, discussed above. Contiguous regions of pixels classified as foreground may eventually be passed to the machine learning engine.

As discussed, the computer vision engine may also include an estimator/identifier component which identifies kinematic and/or appearance features of foreground objects such as size, height, width, and area (in pixels), reflectivity, shininess, rigidity, speed, velocity, etc. In one embodiment, features used to determine sea-surface oil may include the locations and sizes of foreground blobs, rates of change in blob size and/or a measure of intensity (i.e., how bright the blob is), motion characteristics of the foreground blobs, whether the foreground blobs have non-sharp edges, whether the foreground blobs have high fractal dimension, and whether the foreground blobs are asymmetrical. Such features may be particularly relevant to sea-surface oil, as surface oil blobs may tend to be, e.g., irregular in shape and thus have high fractal dimension, be asymmetrical, lack sharp edges, move in certain ways, appear in certain places and have certain sizes, etc.
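A simplified sketch of blob extraction and feature measurement follows. It substitutes a fixed brightness threshold for the per-pixel ART background model described above, which is a deliberate simplification for illustration, and it computes only a few of the listed features (area, centroid, bounding box, and mean intensity). The function name, the threshold value, and the use of 4-connectivity are assumptions.

```python
from collections import deque
from typing import Dict, List


def extract_blobs(frame: List[List[float]], threshold: float = 0.5) -> List[Dict]:
    """Group contiguous above-threshold pixels into blobs and measure each blob."""
    rows = len(frame)
    cols = len(frame[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs: List[Dict] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or frame[r][c] < threshold:
                continue
            # Breadth-first flood fill over 4-connected foreground neighbors.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and frame[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            blobs.append({
                "area": len(pixels),  # size in pixels
                "centroid": (sum(ys) / len(ys), sum(xs) / len(xs)),  # (row, column)
                "bbox": (min(ys), min(xs), max(ys), max(xs)),
                "mean_intensity": sum(frame[y][x] for y, x in pixels) / len(pixels),
            })
    return blobs


frame = [
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.6],
]
blobs = extract_blobs(frame)
```

In this example the three bright pixels in the upper left form one blob and the isolated pixel in the lower right forms another. Rates of change in size or intensity could then be obtained by comparing the measurements of a tracked blob across successive frames.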

The foreground blobs and extracted features are provided to a machine learning engine of the video analysis system, which may observe foreground blobs and, over time, identify where patches tend to appear, how frequently patches appear, how long patches remain (which may depend on where they appeared), how large patches tend to be, and characteristics and/or patterns of other features as they tend to appear in the scene. With the observations of the sea-surface area for a period of time, the machine learning engine may build a model of expected behavior in the scene. Doing so permits commonly-occurring and spurious sea-surface oil patches, which may be caused by, e.g., normal operation of the oil platform, lighting artifacts or changes in the maritime environment, etc., to be learned so that alerts are not generated when such commonly-occurring false-positive patches are observed. For example, using shape, location, or other appearance features, the machine learning engine may automatically learn to classify foreground blobs by shape, location, and appearance. If an observed object in a later video frame has oil-like characteristics, and is thus extracted by the computer vision engine as a foreground blob, the machine learning engine may determine, based on the shape, location, or other appearance features of this new foreground blob, whether the blob resembles objects which were previously observed.

In one embodiment, the machine learning engine may include a long-term memory storing data generalizing events observed in the scene, where the long-term memory is implemented as ART network(s) and sparse-distributed memory data structure(s), discussed above. In such a case, feature vectors may be supplied to an input layer of the ART network (or a combination of a self-organizing map (SOM) and ART network used to cluster nodes in the SOM), and the ART network may map each feature vector to a cluster in the ART network and update that cluster (or create a new cluster if the input feature vector is sufficiently dissimilar to the existing clusters). Over time, predictable “oil” patches, whether resulting from oil generated incident to normal operation of the platform or spurious patches resulting from lighting artifacts or changes in the maritime environment, may produce relatively dense ART network clusters. Then, when another “oil” patch having a similar feature vector is received, the machine learning engine may map this “oil” patch to one of the dense clusters and, given such a mapping, identify the patch as “normal.” That is, the system may learn to ignore commonly-occurring and spurious sea-surface oil patches caused by, e.g., normal operation of the oil platform, lighting artifacts or changes in the maritime environment, etc., since such patches tend to map to relatively dense ART network clusters.
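The idea that patches mapping to dense clusters are treated as “normal” can be sketched as follows. For brevity, this example replaces the ART network with a simple nearest-centroid clusterer that uses a fixed Euclidean radius, so it illustrates only the density-based decision, not the ART matching rule. The class name, the radius, the density threshold, and the two-component feature vectors are all hypothetical.

```python
import math
from typing import List, Optional, Sequence


class ClusterMemory:
    """Illustrative stand-in for a long-term memory of observed oil patches."""

    def __init__(self, radius: float, min_count: int):
        self.radius = radius        # maximum distance at which a patch joins a cluster
        self.min_count = min_count  # observations required before a cluster is dense
        self.centroids: List[List[float]] = []
        self.counts: List[int] = []

    def _nearest(self, x: Sequence[float]) -> Optional[int]:
        # Return the index of the closest centroid within the radius, if any.
        best, best_distance = None, self.radius
        for j, centroid in enumerate(self.centroids):
            distance = math.dist(x, centroid)
            if distance <= best_distance:
                best, best_distance = j, distance
        return best

    def observe(self, x: Sequence[float]) -> bool:
        """Learn from x and return True if x should be reported as anomalous."""
        j = self._nearest(x)
        if j is None:
            # Unlike anything seen before: start a new cluster and report it.
            self.centroids.append(list(x))
            self.counts.append(1)
            return True
        was_dense = self.counts[j] >= self.min_count
        self.counts[j] += 1
        n = self.counts[j]
        # Move the centroid toward x using a running mean.
        self.centroids[j] = [c + (v - c) / n for c, v in zip(self.centroids[j], x)]
        return not was_dense


memory = ClusterMemory(radius=0.1, min_count=3)
routine = [memory.observe([0.2, 0.3]) for _ in range(4)]
unusual = memory.observe([0.9, 0.9])
```

Here a patch that recurs at the same normalized location is reported until its cluster has accumulated three observations and is accepted as normal thereafter, while a patch at a previously unseen location is reported. A deployed system would also need the decay and reinforcement behavior described for the long-term memory, which this sketch omits.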

Additional and further approaches for extracting objects and features from video frames and learning and reporting on behaviors in a scene are discussed in, e.g., U.S. Pat. No. 8,126,833, entitled “Detecting Anomalous Events Using a Long-Term Memory in a Video Analysis System”; U.S. Pat. No. 8,131,012, entitled “Behavioral Recognition System”; U.S. Pat. No. 8,167,430, entitled “Unsupervised Learning of Temporal Anomalies for a Video Surveillance System”; U.S. Pat. No. 8,180,105, entitled “Classifier Anomalies for Observed Behaviors in a Video Surveillance System”; U.S. Pat. No. 8,189,905, entitled “Cognitive Model for a Machine-Learning Engine in a Video Analysis System”; U.S. Pat. No. 8,218,818, entitled “Foreground Object Tracking”; U.S. Pat. No. 8,270,733, entitled “Identifying Anomalous Object Types During Classification”; U.S. Pat. No. 8,285,060, entitled “Detecting Anomalous Trajectories in a Video Surveillance System”; U.S. Pat. No. 8,300,924, entitled “Tracker Component for Behavioral Recognition System”; U.S. Pat. No. 8,358,834, entitled “Background Model for Complex and Dynamic Scenes”; U.S. Pat. No. 8,411,935, entitled “Semantic Representation Module of a Machine-Learning Engine in a Video Analysis System”; U.S. Pat. No. 8,416,296, entitled “Mapper Component for Multiple Art Networks in a Video Analysis System”; and U.S. Pat. No. 8,494,222, entitled “Classifier Anomalies for Observed Behaviors in a Video Surveillance System,” which are hereby incorporated by reference in their entirety.

At step 640, the video analysis system generates alerts when anomalous behavior is observed. As discussed, the machine learning engine may, over time, learn to distinguish between observed patches of sea-surface oil that occur normally and patches of surface oil that do not, and are thus anomalous. When such an anomalous surface oil patch is observed, the video analysis system may issue an alert to, e.g., a user interface, so that the anomalous surface oil patch may be investigated.

Although discussed above with respect to distinguishing oil from seawater, techniques disclosed herein may be used to distinguish other objects having different spectral radiance signatures from one another. In such cases, the radiation need not be infrared light, and may instead be X-rays, ultraviolet light, visible light, microwaves, or radio waves, and appropriate cameras and/or filters may be used to capture the radiation. Further, although discussed above with respect to cameras, other devices, such as spectrometers, may be used in lieu of cameras.

Advantageously, techniques disclosed herein permit surface oil to be distinguished from seawater using input from multiple LWIR cameras whose signals are band-pass filtered and multiplexed to generate a single synthetic discriminant signal. Patterns of behavior in the scene are then learned so that anomalous sea-surface oil patches, which may result from oil spills or leaks, may be reported while other surface oil patches from normal operation of the oil platform or spurious patches from changing maritime conditions, etc., are not reported.

While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

What is claimed is:
 1. A computer-implemented method for analyzing a scene depicted in an input stream of video frames, the method comprising: for one or more of the video frames: identifying one or more foreground blobs in the video frames, wherein each foreground blob corresponds to one or more contiguous pixels of the video frame determined to depict sea-surface oil; and evaluating the one or more foreground blobs to derive expected patterns of observations of sea-surface oil.